Skip to content

Fix: bug(gateguard): race condition in session state read-modify-write path#1442

Open
shahyashish wants to merge 1 commit intoaffaan-m:mainfrom
shahyashish:gitfix-patch-1441-1776222349787
Open

Fix: bug(gateguard): race condition in session state read-modify-write path#1442
shahyashish wants to merge 1 commit intoaffaan-m:mainfrom
shahyashish:gitfix-patch-1441-1776222349787

Conversation

@shahyashish
Copy link
Copy Markdown

@shahyashish shahyashish commented Apr 15, 2026

The fix addresses a race condition in the session state read-modify-write path by implementing a 'merge-style save' strategy. Previously, saveState would simply overwrite the session file with the current in-memory snapshot, which could lead to data loss if two concurrent hook processes were running for the same session. Now, saveState performs a raw read of the current on-disk state immediately before writing, merges it with the in-memory state (creating a union of 'checked' keys and taking the maximum 'last_active' timestamp), and then performs an atomic write via a temp file. This ensures that checks performed by parallel processes are additive rather than destructive. A random suffix was also added to the temp file path to further reduce collision risks.

Test: 1. Set CLAUDE_SESSION_ID=test_race. 2. Run two instances of the hook script nearly simultaneously, each attempting to markChecked a different file (e.g., file_A and file_B). 3. Verify the resulting session JSON file in the state directory (.gateguard/state-test_race.json) contains both file_A and file_B in its checked array, regardless of which process finished first.

Summary by CodeRabbit

  • Bug Fixes
    • Improved data persistence reliability by enhancing state merging and conflict resolution logic.
    • Strengthened error handling to prevent system blocking during data save operations.
    • Enhanced collision resistance for temporary file generation.

Summary by cubic

Fixes a race condition in GateGuard session saves by merging on-disk and in-memory state before an atomic write, preventing lost checked entries during parallel runs. Fixes #1441.

  • Bug Fixes
    • saveState now does a merge-style save: unions checked and takes the max last_active with the on-disk state.
    • Uses an atomic write via a temp file with PID + random suffix to avoid collisions.

Written for commit 4cf12e8. Summary will update on new commits.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3ed214b-8774-4fcb-b8d7-3391337c4932

📥 Commits

Reviewing files that changed from the base of the PR and between 48a30b5 and 4cf12e8.

📒 Files selected for processing (1)
  • scripts/hooks/gateguard-fact-force.js

📝 Walkthrough

Walkthrough

Updated the saveState function in a git hook to prevent in-place state mutation, instead reading existing disk state and merging it with in-memory state by unioning checked entries and taking the maximum last_active. Enhanced collision resistance with improved temporary filename generation and expanded error handling to ensure the hook never blocks.

Changes

Cohort / File(s) Summary
State Persistence & Merge Logic
scripts/hooks/gateguard-fact-force.js
Modified saveState to read existing on-disk state, merge checked entries via union and last_active via max, use collision-resistant temp filenames with PID and random suffix, and broaden error handling to catch all disk I/O failures without blocking the hook.

Possibly related issues

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A hook that won't block, come what may,
Merging states in a safe, atomic way,
Disk meets memory in harmony true,
No collisions now—just smooth sailing through!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix: addressing a race condition in the session state read-modify-write path, which is the core change in the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 15, 2026

Greptile Summary

This PR replaces the simple overwrite in saveState with a merge-style read-modify-write: immediately before writing, the current on-disk state is re-read and unioned with the in-memory state, and a random suffix is added to the temp file name to reduce collision risk. The fix meaningfully narrows the TOCTOU window compared to the original (which read disk at loadState time, far before the eventual write), but a concurrent-read-before-either-writes scenario can still drop entries, so the PR description's "ensures additive" claim is stronger than the code's actual guarantee.

  • P1 – Two processes that both enter saveState's inner readFileSync before either completes renameSync will still clobber each other's new entries; a file-level lock or post-rename re-merge loop is needed to fully close the window.
  • P2 – Orphaned .tmp. files (from mid-write crashes) are silently excluded from pruneStaleFiles because the filter requires the filename to end in .json.
  • No automated test covers the concurrent-write scenario the fix is designed to protect.

Confidence Score: 4/5

Improvement is real but the race condition the PR claims to fix is only narrowed, not eliminated; address before merging if correctness of concurrent writes is a hard requirement.

One P1 finding: the TOCTOU window between the merge-read and the rename persists, meaning the 'ensures additive' guarantee in the PR description does not hold under tight concurrency. The fix is a significant improvement in practice (window shrinks from call-stack duration to disk I/O time), but the stated correctness claim is overstated.

scripts/hooks/gateguard-fact-force.js — specifically the saveState merge window (lines 86–108) and the pruneStaleFiles filter (line 133)

Important Files Changed

Filename Overview
scripts/hooks/gateguard-fact-force.js Merge-style saveState reduces the TOCTOU window but a concurrent read-before-write race survives; orphaned temp files also not cleaned up

Sequence Diagram

sequenceDiagram
    participant PA as Process A (saveState)
    participant FS as File System
    participant PB as Process B (saveState)

    Note over PA,PB: Happy path — fix works (serial reads)
    PA->>FS: readFileSync(STATE_FILE) → {checked:[]}
    PA->>FS: renameSync(tmp_A) → {checked:["file_A"]}
    PB->>FS: readFileSync(STATE_FILE) → {checked:["file_A"]}
    PB->>FS: renameSync(tmp_B) → {checked:["file_A","file_B"]}

    Note over PA,PB: Remaining race — both read before either writes
    PA->>FS: readFileSync(STATE_FILE) → {checked:[]}
    PB->>FS: readFileSync(STATE_FILE) → {checked:[]}
    PA->>FS: renameSync(tmp_A) → {checked:["file_A"]}
    PB->>FS: renameSync(tmp_B) → {checked:["file_B"]}
    Note over FS: file_A silently lost
Loading

Comments Outside Diff (1)

  1. scripts/hooks/gateguard-fact-force.js, line 132-133 (link)

    P2 Orphaned temp files not covered by stale-file cleanup

    The pruneStaleFiles guard skips any filename that doesn't end in .json. The new temp-file name is state-<session>.json.tmp.<pid>.<rand>, which doesn't satisfy f.endsWith('.json'), so a process that crashes between writeFileSync (line 107) and renameSync (line 108) leaves an orphan that is never reaped. Consider broadening the filter or adding a secondary pass:

    // also clean up orphaned temp files
    if (f.startsWith('state-') && f.includes('.tmp.')) {
      const fp = path.join(STATE_DIR, f);
      const stat = fs.statSync(fp);
      if (now - stat.mtimeMs > SESSION_TIMEOUT_MS) fs.unlinkSync(fp);
    }

Reviews (1): Last reviewed commit: "Fix issue #1441: bug(gateguard): race co..." | Re-trigger Greptile

Comment on lines +86 to 108
try {
if (fs.existsSync(STATE_FILE)) {
const disk = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8'));
if (Array.isArray(disk.checked)) {
// Union of checked items, preserving order (disk items first, then memory items)
mergedChecked = Array.from(new Set([...disk.checked, ...mergedChecked]));
}
if (typeof disk.last_active === 'number') {
mergedLastActive = Math.max(mergedLastActive, disk.last_active);
}
}
} catch (_) { /* ignore disk read errors, proceed with memory state */ }

// 3. Finalize state: update heartbeat and prune to bounded size
const finalState = {
checked: pruneCheckedEntries(mergedChecked),
last_active: Math.max(mergedLastActive, Date.now())
};

// 4. Atomic write: temp file + rename prevents partial reads
const tmpFile = STATE_FILE + '.tmp.' + process.pid + '.' + Math.random().toString(36).slice(2, 6);
fs.writeFileSync(tmpFile, JSON.stringify(finalState, null, 2), 'utf8');
fs.renameSync(tmpFile, STATE_FILE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 TOCTOU window still present between merge-read and rename

The merge read (readFileSync on line 88) and the atomic rename (line 108) are separate, non-atomic operations. If two processes both complete their readFileSync before either calls renameSync, the second rename will silently overwrite the first's new entries. The PR description says this "ensures that checks... are additive rather than destructive," but that guarantee only holds when one process completes its rename before the other's inner read — a timing assumption that cannot be relied upon.

Concrete failure path:

  1. Process A reads disk → {checked: []}
  2. Process B reads disk → {checked: []} (before A writes)
  3. Process A renames → ["file_A"] persisted
  4. Process B renames → ["file_B"] persisted, "file_A" silently lost

The window is now the disk I/O span of lines 87–97 rather than the entire loadStatesaveState call chain, which is a meaningful improvement, but the window is not zero. Truly eliminating it would require an OS-level exclusive lock (e.g. lockfile / proper-lockfile) around the read-modify-write, or a retry loop that re-reads after the rename and merges again if the file changed.

}

module.exports = { run };
module.exports = { run }; No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing newline at end of file

The diff shows \ No newline at end of file. POSIX requires text files to end with a newline, and many tools (diffing, cat, linters) behave unexpectedly without one.

Suggested change
module.exports = { run };
module.exports = { run };

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/hooks/gateguard-fact-force.js">

<violation number="1" location="scripts/hooks/gateguard-fact-force.js:88">
P1: Concurrent saveState calls can still lose updates because merge-style read-modify-write is not synchronized across processes.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

// 2. Merge with current disk state to prevent overwriting concurrent updates (Option 2: Merge-style saves)
try {
if (fs.existsSync(STATE_FILE)) {
const disk = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8'));
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Concurrent saveState calls can still lose updates because merge-style read-modify-write is not synchronized across processes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/hooks/gateguard-fact-force.js, line 88:

<comment>Concurrent saveState calls can still lose updates because merge-style read-modify-write is not synchronized across processes.</comment>

<file context>
@@ -76,14 +76,37 @@ function pruneCheckedEntries(checked) {
+    // 2. Merge with current disk state to prevent overwriting concurrent updates (Option 2: Merge-style saves)
+    try {
+      if (fs.existsSync(STATE_FILE)) {
+        const disk = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8'));
+        if (Array.isArray(disk.checked)) {
+          // Union of checked items, preserving order (disk items first, then memory items)
</file context>
Fix with Cubic

@shenchangmin
Copy link
Copy Markdown

@shahyashish — thanks for picking up #1441 so quickly. Heads up per #1453: the merge-style rewrite in your new `saveState` dropped the Windows `EEXIST` / `EPERM` unlink-before-rename branch that currently lives in `main` (landed as part of #1439's GateGuard consolidation). `fs.renameSync` throws `EEXIST` on Windows when the target already exists, so without the unlink fallback the second write onwards silently no-ops inside the catch-all and GateGuard stops seeing checked files.

The branch to preserve:

```js
try {
fs.renameSync(tmpFile, STATE_FILE);
} catch (error) {
if (error && (error.code === 'EEXIST' || error.code === 'EPERM')) {
try { fs.unlinkSync(STATE_FILE); } catch (_) { /* ignore */ }
fs.renameSync(tmpFile, STATE_FILE);
} else {
throw error;
}
}
```

FWIW #1450 hit a parallel shape on the Python side — three Windows-specific pitfalls in a row (launcher-as-cmd-builtin → cmd.exe re-parsing → argv quoting) before landing on `explorer.exe`. Cross-platform edge cases tend to come in layers; might be worth a full Windows smoke pass on the merged state before ship.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(gateguard): race condition in session state read-modify-write path

2 participants